(DOCSP-50370): Create new LangChain self-query retrieval notebook #21
Everything works, but I've got a couple of nits with re-declaring stuff we've already declared, and some of the filter results. Non-blocking comments below!
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4o\")\n",
Nit: in the context of this notebook, we're re-declaring an `llm` we already declared up above in ln 233. I'd probably omit this line, and omit the related import `from langchain_openai import ChatOpenAI` in ln 343 above.
I also don't love re-declaring the retriever with one additional param. It would be great if we could set `enable_limit` when we initially declare the retriever in ln 234, and then remove the re-initializing here.
It makes sense to have these things on a docs page if we want this to be a stand-alone code example, but here in the context of the notebook, it's not needed.
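For reference, a minimal sketch of what declaring the retriever once with `enable_limit` could look like. The helper name `build_retriever`, the field descriptions and types, and the document-contents string are assumptions for illustration, not the notebook's actual code; the import paths are those of the classic `langchain` package and may differ by version.

```python
# Metadata fields seen in the sample documents in this review (year, rating, genre).
# The descriptions and types are assumptions for illustration.
METADATA_FIELDS = [
    {"name": "genre", "description": "The genre of the movie", "type": "string"},
    {"name": "year", "description": "The year the movie was released", "type": "integer"},
    {"name": "rating", "description": "A 1-10 rating for the movie", "type": "float"},
]


def build_retriever(llm, vectorstore):
    """Declare the self-query retriever once, with enable_limit set up front."""
    # Imports are local so the function can be defined before LangChain is loaded;
    # these paths match the classic `langchain` package and may differ by version.
    from langchain.chains.query_constructor.schema import AttributeInfo
    from langchain.retrievers.self_query.base import SelfQueryRetriever

    return SelfQueryRetriever.from_llm(
        llm,
        vectorstore,
        "Brief summary of a movie",  # assumed description of the document contents
        [AttributeInfo(**field) for field in METADATA_FIELDS],
        enable_limit=True,  # lets the query itself cap how many documents come back
    )
```

With something like this, later cells could reuse the existing `llm` and the single retriever instead of re-initializing either.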
Good catch!
"id": "833d90d9",
"metadata": {},
"source": [
"### Queries with filters"
I got some query results that seem unrelated to the filter. For example, for "toys", I got this document:
Document(id='685eaec1edc703d86a4c7201', metadata={'_id': '685eaec1edc703d86a4c7201', 'year': 1979, 'rating': 9.9, 'genre': 'science fiction'}, page_content='Three men walk into the Zone, three men walk out of the Zone')
For thriller and action, I got this document:
Document(id='685eaec1edc703d86a4c7203', metadata={'_id': '685eaec1edc703d86a4c7203', 'year': 1995, 'genre': 'animated', 'rating': 9.3}, page_content='Toys come alive and have a blast doing so')
I'm sure this is related to the limited amount of sample data we're providing, but returning these seemingly unrelated results doesn't show the feature off well. I wonder if we want to add more sample data so that only obviously related results are retrieved?
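To illustrate the suspected cause, here is a toy sketch of top-k retrieval over a tiny corpus. It assumes a simple word-overlap score in place of real vector similarity, and the `Doc`, `overlap_score`, and `retrieve` names are hypothetical; it is not how the vector store actually ranks documents. The two sample documents are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    genre: str
    year: int
    rating: float


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of lowercase words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(docs, query, genres=None, k=2):
    """Apply an optional genre filter, then return the k highest-scoring documents."""
    candidates = [d for d in docs if genres is None or d.genre in genres]
    ranked = sorted(candidates, key=lambda d: overlap_score(query, d.text), reverse=True)
    return ranked[:k]


DOCS = [
    Doc("Three men walk into the Zone, three men walk out of the Zone",
        "science fiction", 1979, 9.9),
    Doc("Toys come alive and have a blast doing so", "animated", 1995, 9.3),
]

# With no metadata filter, top-k still returns k documents, so the
# zero-score science fiction entry pads out the results for "toys".
unfiltered = retrieve(DOCS, "toys", k=2)

# With a genre filter applied first, only the matching document can come back.
filtered = retrieve(DOCS, "toys", genres={"animated"}, k=2)
```

In this sketch the unrelated document appears only because the corpus has fewer relevant entries than `k`, which is consistent with adding more sample data as a fix.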
Done! Added more data and improved the queries so the outputs are more descriptive.
DOCSP-50370